Data Compression and Clustering: a “Blind” Approach to Classification
Abstract
Data compression is today essential for a wide range of applications: for example, the infrastructures of the Internet and the World Wide Web benefit from compression. New general-purpose compression methods are continually being developed, in particular methods that allow indexing over compressed data or that provide error resilience. Compression also inspires information-theoretic tools for pattern discovery and classification; in particular, data compression can be used as a metric for clustering. This leads to a powerful clustering strategy that uses no “semantic” information about the data to be classified. Instead, it performs a “blind” yet effective classification based only on the compressibility of the digital data, not on its “meaning”. Here we experiment with this strategy and show that it is effective.

Key-Words: Data Compression, Clustering, Dictionary-based compression, Classification.
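The abstract does not spell out which compression-based metric is used. The following is a minimal sketch under several assumptions:

- The metric is the widely used Normalized Compression Distance (NCD) of Cilibrasi and Vitányi, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed size.
- Python's zlib (DEFLATE, a dictionary-based LZ77 compressor) stands in for C.
- The grouping step is plain single-linkage agglomerative clustering.

The helper names (`compressed_size`, `ncd`, `cluster`, `mutate`) and the synthetic byte-string data are hypothetical illustrations. They are not the paper's code or experiments.

```python
import random
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    # Size in bytes of the zlib (DEFLATE: LZ77 dictionary + Huffman) output at level 9.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance (assumed metric, not confirmed by the abstract):
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    # Close to 0 when x and y share most of their content, close to 1 when unrelated.
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cluster(items: dict, k: int) -> list:
    # Single-linkage agglomerative clustering on pairwise NCD values:
    # start from singletons and repeatedly merge the two closest clusters
    # until k remain. Only compressed sizes are used, no features or semantics.
    names = list(items)
    dist = {frozenset((a, b)): ncd(items[a], items[b]) for a, b in combinations(names, 2)}
    clusters = [{name} for name in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge j into i; i < j, so index i stays valid
        del clusters[j]
    return clusters


# Hypothetical demo data: two families of near-duplicate random byte strings.
rng = random.Random(0)


def mutate(base: bytes, n_changes: int) -> bytes:
    # Overwrite n_changes randomly chosen positions with random byte values.
    out = bytearray(base)
    for _ in range(n_changes):
        out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


base_a = bytes(rng.randrange(256) for _ in range(2000))
base_b = bytes(rng.randrange(256) for _ in range(2000))
items = {name: mutate(base_a, 20) for name in ("a1", "a2", "a3")}
items.update({name: mutate(base_b, 20) for name in ("b1", "b2")})

groups = cluster(items, 2)
print(sorted(sorted(g) for g in groups))  # expected: [['a1', 'a2', 'a3'], ['b1', 'b2']]
```

The members of each family compress well together, because the second copy is mostly back-references into the first. Unrelated strings gain almost nothing from concatenation. As a result, the clusters emerge without any knowledge of what the bytes "mean".

This sketch has two caveats:

- DEFLATE only looks back 32 KB, so objects much larger than that cannot share matches across the concatenation, and NCD drifts toward 1. Compressors with larger dictionaries, such as `lzma` from the standard library, are a common substitute.
- Only one concatenation order is computed here, and C(xy) and C(yx) can differ slightly.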
Publication date: 2012